
    Distributed learning of CNNs on heterogeneous CPU/GPU architectures

    Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks ranging from check reading to medical diagnosis, reaching close to human perception and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates into larger CNNs and longer training times that not even the adoption of Graphics Processing Units (GPUs) can keep up with. This problem is partially solved by using more processing units and the distributed training methods offered by several frameworks dedicated to neural network training. However, these techniques do not take full advantage of the parallelization opportunities offered by CNNs or of the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory sizes, and so on. This paper presents a new method for the parallel training of CNNs that can be considered a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent 60-90% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices (including their processing capabilities), and other parameters. Results show that this technique is capable of diminishing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers of 500 and 1500 kernels, respectively, the best speedups reach 3.28× using four CPUs and 2.45× with three GPUs. Modern imaging datasets, larger and more complex than CIFAR-10, will certainly devote more than 60-90% of processing time to calculating convolutions, and speedups will tend to increase accordingly.
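    To make the layer-level model parallelism above concrete, here is a minimal sketch that partitions a convolutional layer's kernels across devices in proportion to their processing capabilities and concatenates the partial feature maps. The conv2d helper, the proportional split, and the sequential execution are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def conv2d(x, kernels):
        # Naive valid convolution: x is (H, W), kernels is (K, kH, kW);
        # returns (K, H-kH+1, W-kW+1). Stand-in for a real conv routine.
        K, kH, kW = kernels.shape
        H, W = x.shape
        out = np.empty((K, H - kH + 1, W - kW + 1))
        for k in range(K):
            for i in range(H - kH + 1):
                for j in range(W - kW + 1):
                    out[k, i, j] = np.sum(x[i:i+kH, j:j+kW] * kernels[k])
        return out

    def distributed_forward(x, kernels, device_speeds):
        # Split the kernel set proportionally to each device's relative
        # speed, run the partial convolutions (here sequentially; in a
        # real system, on separate CPUs/GPUs), then concatenate the
        # partial feature maps along the channel axis.
        total = sum(device_speeds)
        shares = [round(len(kernels) * s / total) for s in device_speeds]
        shares[-1] = len(kernels) - sum(shares[:-1])  # absorb rounding
        parts, start = [], 0
        for n in shares:
            parts.append(conv2d(x, kernels[start:start + n]))
            start += n
        return np.concatenate(parts, axis=0)

    For example, distributed_forward(np.random.rand(32, 32), np.random.rand(500, 5, 5), [1.0, 2.0, 1.5]) would assign roughly 111, 222, and 167 of the 500 kernels to three devices of those relative speeds.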

    A logic for n-dimensional hierarchical refinement

    Hierarchical transition systems provide a popular mathematical structure to represent state-based software applications in which different layers of abstraction are represented by inter-related state machines. The decomposition of high-level states into inner sub-states, and of their transitions into inner sub-transitions, is a common refinement procedure adopted in a number of specification formalisms. This paper introduces a hybrid modal logic for k-layered transition systems, its first-order standard translation, a notion of bisimulation, and a modal invariance result. Layered and hierarchical notions of refinement are also discussed in this setting. Comment: In Proceedings Refine'15, arXiv:1606.0134
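    As a concrete picture of the layered structure discussed above, the following toy encoding represents a 2-layered transition system and a standard-translation-style flattening. All state names and the entry-state convention for top-level transitions are illustrative assumptions, not the logic defined in the paper.

    # Top layer: abstract states and labeled transitions between them.
    top_states = {"Idle", "Busy"}
    top_trans = {("Idle", "start", "Busy"), ("Busy", "stop", "Idle")}

    # Lower layer: each abstract state refines into sub-states and
    # sub-transitions local to that state.
    inner = {
        "Idle": {"states": {"Off", "Ready"},
                 "trans": {("Off", "boot", "Ready")}},
        "Busy": {"states": {"Run", "Wait"},
                 "trans": {("Run", "block", "Wait"), ("Wait", "wake", "Run")}},
    }
    entry = {"Idle": "Off", "Busy": "Run"}  # designated initial sub-states

    def flatten():
        # A global state is a pair (top state, sub-state). Inner
        # transitions stay within one top state; a top-level transition
        # jumps from any sub-state of its source to the entry sub-state
        # of its target.
        states = {(s, q) for s in top_states for q in inner[s]["states"]}
        trans = {((s, a), l, (s, b))
                 for s in top_states for (a, l, b) in inner[s]["trans"]}
        trans |= {((s, q), l, (t, entry[t]))
                  for (s, l, t) in top_trans for q in inner[s]["states"]}
        return states, trans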

    Distribution-Based Categorization of Classifier Transfer Learning

    Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained significant interest in the Machine Learning community, since it paves the way to devise intelligent learning models that can easily be tailored to many different applications. As is natural in a fast-evolving area, a wide variety of TL methods, settings, and nomenclature have been proposed so far, and a wide range of works report different names for the same concepts. This mixture of concepts and terminology obscures the TL field and hinders its proper consideration. In this paper we present a review of the literature on the majority of classification TL methods, together with a distribution-based categorization of TL under a common nomenclature suitable for classification problems. From this perspective, three main TL categories are presented, discussed, and illustrated with examples.
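    To illustrate the distribution-based perspective (without reproducing the paper's actual categories, which the abstract does not name), the sketch below estimates how far apart the source and target input marginals P(X) are; the Jensen-Shannon divergence and the histogram binning are illustrative choices.

    import numpy as np

    def js_divergence(p, q, eps=1e-12):
        # Jensen-Shannon divergence between two discrete distributions.
        p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
        p, q = p / p.sum(), q / q.sum()
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log(a / b))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def marginal_shift(source_x, target_x, bins=20):
        # Histogram both samples on a shared grid and measure how far
        # the marginal input distributions P(X) are from each other; a
        # large value suggests the source and target problems differ in
        # their inputs, not only in their labeling functions.
        lo = min(source_x.min(), target_x.min())
        hi = max(source_x.max(), target_x.max())
        ps, _ = np.histogram(source_x, bins=bins, range=(lo, hi))
        pt, _ = np.histogram(target_x, bins=bins, range=(lo, hi))
        return js_divergence(ps, pt)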

    Improving Makespan in Dynamic Task Scheduling for Cloud Robotic Systems with Time Window Constraints

    A scheduling method with minimal makespan in a robotic network cloud system is beneficial, as the system can then complete all the tasks assigned to it in the fastest possible way. Robotic network cloud systems can be translated into graphs, where nodes represent hardware with independent computing power and edges represent data transmissions between nodes. Time-window constraints on tasks are a natural way to order tasks. The makespan is the maximum amount of time between when a node starts executing its first scheduled task and when all nodes have completed their last scheduled task. Load-balancing allocation and scheduling ensures that the time between when the first node completes its scheduled tasks and when all other nodes complete theirs is as short as possible. We propose a new load-balancing algorithm for task allocation and scheduling with minimal makespan. We theoretically prove the correctness of the proposed algorithm and present simulations illustrating the obtained results. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
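    For intuition about makespan under time-window constraints, here is a minimal greedy baseline (earliest-deadline-first onto the least-loaded node). It is not the paper's load-balancing algorithm, and the task tuple format (duration, release, deadline) is an assumption.

    def greedy_schedule(tasks, n_nodes):
        # tasks: list of (duration, release, deadline) tuples.
        finish = [0.0] * n_nodes           # per-node completion times
        plan = []
        # Earliest-deadline-first over the task list.
        for dur, release, deadline in sorted(tasks, key=lambda t: t[2]):
            node = min(range(n_nodes), key=lambda i: finish[i])
            start = max(finish[node], release)
            if start + dur > deadline:
                raise ValueError("task cannot meet its time window")
            finish[node] = start + dur
            plan.append((node, start, start + dur))
        # Approximation of the makespan defined above, taking time 0 as
        # the common start of all nodes.
        return plan, max(finish)

    For example, greedy_schedule([(2, 0, 5), (3, 0, 6), (1, 2, 4)], 2) spreads the three tasks over two nodes and reports the resulting makespan.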

    Contrastive Learning from Demonstrations

    This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring dataset and a custom Pick and Place dataset, and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification, and reinforcement learning; in all cases the results improve over state-of-the-art approaches, with the added benefit of a reduced number of training iterations.
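    As an illustration of the contrastive objective described above, the following InfoNCE-style loss treats time-aligned frames from two viewpoints as positives and all other frames in the batch as negatives. The exact loss, temperature, and pairing used in the paper may differ, so treat this as a sketch.

    import numpy as np

    def info_nce(view_a, view_b, temperature=0.1):
        # view_a, view_b: (N, D) embeddings of time-aligned frames from
        # two viewpoints; row i of each is a positive pair.
        a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
        b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
        logits = a @ b.T / temperature               # (N, N) similarities
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))           # positives on the diagonal

    Minimizing this loss pulls embeddings of the same moment seen from different viewpoints together while pushing apart embeddings of different moments, which is one way to suppress viewpoint-specific, task-irrelevant information.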

    Population-based JPEG Image Compression: Problem Re-Formulation

    The JPEG standard is widely used in different image processing applications. One of the main components of the JPEG standard is the quantisation table (QT), since it plays a vital role in image properties such as image quality and file size. In recent years, several efforts based on population-based metaheuristic (PBMH) algorithms have been made to find the proper QT(s) for a specific image, although they do not take the user's opinion into consideration. Consider, for example, an Android developer who prefers a small image size: the optimisation process may nevertheless yield a high-quality image with a huge file size. Another pitfall of the current works is a lack of comprehensive coverage, meaning that the QT(s) cannot provide all possible combinations of file size and quality. This paper therefore makes three distinct contributions. First, to include the user's opinion in the compression process, the file size of the output image can be controlled by the user in advance. Second, to tackle the lack of comprehensive coverage, we suggest a novel representation. Our proposed representation can not only provide more comprehensive coverage but also find the proper value of the quality factor for a specific image without any background knowledge. Both changes, in representation and in objective function, are independent of the search strategy and can be used with any type of PBMH algorithm. Therefore, as the third contribution, we also provide a comprehensive benchmark of 22 state-of-the-art and recently introduced PBMH algorithms on our new formulation of JPEG image compression. Our extensive experiments on different benchmark images, and in terms of different criteria, show that our novel formulation for JPEG image compression works effectively. Comment: 39 pages; this paper is submitted to the related journal.
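    To make the size-aware formulation concrete, the sketch below scores a candidate quality factor by how close the encoded file size lands to the user's target. The use of Pillow, the single quality-factor gene, and the random-search stand-in for a PBMH are all simplifying assumptions; the paper's representation also covers the QT entries themselves.

    import io
    import random
    from PIL import Image

    def fitness(img, quality, target_bytes):
        # Encode the image at the candidate quality factor and measure
        # how far the resulting file size is from the user's target.
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=int(quality))
        return abs(buf.tell() - target_bytes)  # lower is better

    def random_search(img, target_bytes, iters=50, seed=0):
        # Stand-in for a population-based metaheuristic: sample quality
        # factors uniformly and keep the best-scoring one.
        rng = random.Random(seed)
        best_q, best_f = None, float("inf")
        for _ in range(iters):
            q = rng.randint(1, 95)
            f = fitness(img, q, target_bytes)
            if f < best_f:
                best_q, best_f = q, f
        return best_q

    Any PBMH (genetic algorithm, particle swarm, and so on) can replace random_search, since only the representation and the fitness function are specific to the compression problem.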